Configure Agent #524
Conversation
internal/events/handler.go
Outdated
if err != nil {
    return err
    h.cfg.Logger.Error(err, "error storing dataplane configuration")
    // FIXME(kate-osborn): Update status to indicate that the gateway is not accepted or programmed.
@pleshakov how do you feel about making this task a separate story to reduce the scope of this PR?
this makes sense to me!
c.logger.Info("Received agent connect request", "message ID", cmd.GetMeta().GetMessageId())
c.logger.Info("Received agent connect request")

c.logger = c.logger.WithValues("podName", req.GetMeta().DisplayName)
@pleshakov can you think of any reason adding a value to the logger during runtime would cause a problem? I think this is ok, but I might be missing something.
this looks like a data race (multiple goroutines have access to c.logger and at least one of them is writing)
perhaps we can refactor the connection so that when it starts, it always waits for the connect to finish, it then extracts the pod name, updates the logger and starts the connection?
perhaps we can refactor the connection so that when it starts, it always waits for the connect to finish, it then extracts the pod name, updates the logger and starts the connection?
Intriguing idea. I'll play around with it.
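A minimal sketch of that ordering (waitForConnect and run are hypothetical method names, not the actual code): the logger is only written before any goroutine that reads it is started, so there is no race.

func (c *connection) start(ctx context.Context) error {
    // Block until the agent's connect request arrives.
    req, err := c.waitForConnect(ctx)
    if err != nil {
        return err
    }

    // Safe to write: no other goroutine is reading c.logger yet.
    c.logger = c.logger.WithValues("podName", req.GetMeta().DisplayName)

    // Only now start the goroutines that serve the connection and read c.logger.
    return c.run(ctx)
}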
)

const (
    // TODO: do we need another file mode for config files?
@pleshakov thoughts?
do we expect manual changes to the files (i.e. changes not made by the agent)? if not, I think we should make all of them (conf and secrets) 0o400
No, we don't expect manual changes. I'll change them all to 0o400.
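A sketch of what the constants could end up looking like (the constant names are assumptions): with no manual edits expected, config files and secrets can share the same read-only mode.

const (
    // configFileMode is the mode for generated nginx config files.
    configFileMode = 0o400
    // secretFileMode is the mode for TLS secret files.
    secretFileMode = 0o400
)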
internal/state/secrets/secrets.go
Outdated
// WriteAllRequestedSecrets writes all requested secrets to disk.
WriteAllRequestedSecrets() error
// GetAllRequestedSecrets returns all requested secrets as Files.
GetAllRequestedSecrets() []File
I left the WriteAllRequestedSecrets method because I'm toying with the idea of implementing the ConfigStorer, Subject, and Observer interfaces for configuring an nginx instance in the same Pod. That would allow us to benchmark the two approaches. What do you think?
Either way, I still need to rename this interface, or separate the Get functionality from the Write.
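One possible shape for that split, purely as an illustration (the interface names here are hypothetical): keep the read path the agent needs separate from the write-to-disk path used when nginx runs in the same Pod.

type SecretFileGetter interface {
    // GetAllRequestedSecrets returns all requested secrets as Files.
    GetAllRequestedSecrets() []File
}

type SecretFileWriter interface {
    // WriteAllRequestedSecrets writes all requested secrets to disk.
    WriteAllRequestedSecrets() error
}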
That would allow us to benchmark the two approaches.
what kind of benchmarks do you have in mind?
Not sure yet... it would probably be an e2e test where we benchmark the time it takes to update the config in the dataplane.
However, I'm worried about lock contention. If we have many agents, will we create a bottleneck in the ConfigStore, with each agent trying to get the latest config?
is there a way to quickly test it?
We could address this by using the push variation of the Observer pattern. In this variation, the ConfigStore would send the latest config on every Notify() call to all its observers (agent connections). This would move the locking from the ConfigStore to the individual agent connections. Each connection would save the latest config and queue a config update. This would require a lock for the config (could use atomic.Value) to prevent a race condition between a connection reader and writer. This variation could make the connection logic more complex but could be better overall.
I believe the current approach has a locking problem (mentioned it in an inline comment), so additional locking logic is required either way.
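A rough sketch of the push variation on the connection side (the field and method names -- Update, applyLoop, apply -- are assumptions, and only the relevant fields are shown): the store pushes the latest config into each connection, which records it in an atomic.Value and queues at most one pending update.

type connection struct {
    latest  atomic.Value  // most recent *NginxConfiguration pushed by the ConfigStore
    updated chan struct{} // buffered with capacity 1; acts as a "config pending" flag
}

// Update is what the ConfigStore would call on every Notify().
func (c *connection) Update(cfg *NginxConfiguration) {
    c.latest.Store(cfg)
    select {
    case c.updated <- struct{}{}:
    default: // an update is already queued; the loop will pick up the latest config anyway
    }
}

func (c *connection) applyLoop(ctx context.Context) {
    for {
        select {
        case <-ctx.Done():
            return
        case <-c.updated:
            cfg := c.latest.Load().(*NginxConfiguration)
            c.apply(ctx, cfg) // apply the config to the agent; details omitted
        }
    }
}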
Perhaps another approach here could be for agents to create a one-time watch. Something like below (note: cancellation is omitted for simplicity):

var latestCfgID int
for {
    cfg := configStore.GetConfigWithGreaterID(latestCfgID)
    push(cfg)
    latestCfgID = cfg.ID
}

That will also remove the need for Register and Remove methods.
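A sketch of how GetConfigWithGreaterID could be implemented (illustrative only; it assumes NginxConfiguration carries an ID field): a mutex plus a sync.Cond that wakes every waiting connection when a newer config is stored.

type ConfigStore struct {
    mu   sync.Mutex
    cond *sync.Cond
    cfg  *NginxConfiguration
}

func NewConfigStore() *ConfigStore {
    s := &ConfigStore{}
    s.cond = sync.NewCond(&s.mu)
    return s
}

// Store saves a new config and wakes up every waiting connection.
func (s *ConfigStore) Store(cfg *NginxConfiguration) {
    s.mu.Lock()
    s.cfg = cfg
    s.mu.Unlock()
    s.cond.Broadcast()
}

// GetConfigWithGreaterID blocks until a config with an ID greater than lastID is stored.
func (s *ConfigStore) GetConfigWithGreaterID(lastID int) *NginxConfiguration {
    s.mu.Lock()
    defer s.mu.Unlock()
    for s.cfg == nil || s.cfg.ID <= lastID {
        s.cond.Wait()
    }
    return s.cfg
}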
c.sendConfigApplyResponse(ctx, car)
}

func (c *connection) sendConfigApplyResponse(ctx context.Context, response configApplyResponse) {
"send" is a bit misleading here, because we're not sending anything to the agent
    }
}

func (u *NginxConfigBuilder) generateZippedFile(dir directory) (*proto.ZippedFile, error) {
we're not using the receiver u at all inside the function, right?
internal/nginx/agent/config_store.go
Outdated
// ConfigStore implements the observer.Subject interface,
// so that it can notify the agent observers when the configuration changes.
// ConfigStore is thread-safe.
type ConfigStore struct {
would it make sense for this type to be more generalized, so that it doesn't deal with the NGK-specific configuration? This way, it can potentially be reused in other projects, to configure a set of data plane replicas with the same configuration.
How would that work? Would we just make this generic and have it store and return the any type?
I'm worried about extracting the building piece of this. I only want to build the nginx agent config once because it involves costly operations like checksums.
I guess I could perform the dataplane.Configuration -> agent.NginxConfiguration before storing the config and notifying the observers.
I guess I could perform the dataplane.Configuration -> agent.NginxConfiguration before storing the config and notifying the observers.
yep
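If we ever did want to generalize the store itself, a type-parameterized version is one option. A sketch (illustrative only; buildNginxConfiguration is a stand-in for whatever performs the conversion): the caller does the dataplane.Configuration to agent.NginxConfiguration conversion once, before calling Set, so the expensive work (checksums, etc.) still happens only once.

type Store[T any] struct {
    mu     sync.RWMutex
    latest T
}

func (s *Store[T]) Set(latest T) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.latest = latest
}

func (s *Store[T]) Get() T {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.latest
}

// Usage: build the agent config once, then store it.
// store := &Store[*agent.NginxConfiguration]{}
// store.Set(buildNginxConfiguration(dataplaneCfg))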
internal/nginx/agent/config_store.go
Outdated
}

// Notify notifies all registered observers.
func (a *ConfigStore) Notify() {
does this method need to be public?
nope, my bad
@pleshakov this is ready for another review -- still a draft. Since it's a significant refactor and it's been a while since your last review, would it be easier to open a fresh PR?

that makes sense

Will reopen
This is a draft PR for pushing nginx configuration to connected agents.
This is not ready for a full review: some tests are broken, and others are missing altogether. There are still some open questions that are tagged as TODOs and others that I will call out in the comments. I would appreciate any feedback/opinions on these as you see them.
I'm looking for feedback on the overall design. The agent configuration is implemented using the Observer design pattern, specifically, the pull variation of Observer. The pull variation forces the ConfigStore to be thread-safe, as there will be many readers (agent connections) and one writer. This variation simplifies the update logic for agent connections because they pull the latest config whenever they are done waiting for a previous config apply to complete.

However, I'm worried about lock contention. If we have many agents, will we create a bottleneck in the ConfigStore, with each agent trying to get the latest config?

We could address this by using the push variation of the Observer pattern. In this variation, the ConfigStore would send the latest config on every Notify() call to all its observers (agent connections). This would move the locking from the ConfigStore to the individual agent connections. Each connection would save the latest config and queue a config update. This would require a lock for the config (could use atomic.Value) to prevent a race condition between a connection reader and writer. This variation could make the connection logic more complex but could be better overall.
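To make the pull variation concrete, a rough sketch of the shape of the store (names are approximate, not the exact code in this PR): Notify only signals the registered observers, and each agent connection calls Get when it is ready to apply the next config.

type Observer interface {
    // Update tells the observer that new configuration is available.
    Update()
}

type ConfigStore struct {
    mu        sync.RWMutex
    latest    *NginxConfiguration
    observers []Observer
}

// Store is called by the single writer (the event handler).
func (s *ConfigStore) Store(cfg *NginxConfiguration) {
    s.mu.Lock()
    s.latest = cfg
    s.mu.Unlock()
}

// Get is called by every agent connection; with many agents this RWMutex is the
// potential contention point described above.
func (s *ConfigStore) Get() *NginxConfiguration {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.latest
}

func (s *ConfigStore) Register(o Observer) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.observers = append(s.observers, o)
}

// Notify signals all observers; each pulls the latest config via Get when it is ready.
func (s *ConfigStore) Notify() {
    s.mu.RLock()
    defer s.mu.RUnlock()
    for _, o := range s.observers {
        o.Update()
    }
}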